This document summarizes a technology presentation about the IBM Smart Analytics Optimizer (ISAO). ISAO is a database accelerator that uses large main memories, commodity hardware, and extreme data compression to speed up typical data warehouse and business intelligence SQL queries by 10-100x without requiring tuning of indexes or materialized views. The presentation covers ISAO's market opportunities in business intelligence, its architecture as a network-attached accelerator to IBM's Informix database, and its disruptive query processing techniques that achieve predictably fast performance for ad hoc queries.
13. IBM Partner testing their customer warehouse. ISAO Accelerates Most the Longest-Running Informix Queries. Average speed-up = 116x. (Chart label: 1 hour.)
18. Datamart and Data Off-loading from Informix to ISAO. Components: Informix, ISAO Appliance, IBM Optim Data Studio. Steps: (1) Identify the datamart to offload. (2) Datamart definition. (3) Return the SQL representation. (4) Create the metadata. (5) Issue Off-load Datamart command. (6) Off-load the data. (7) Distribute the data among blades. (8) Compress the data. (9) Return ACK.
Editor's Notes
Explaining the flow: defining the mart within the GUI (shown in more detail on the following slide); deploying the definitions to IDS; starting the LOAD and transformation of the data into DWA.
Large objects (LOBs), user-defined types, and complex types in Informix. More than 225 tables or more than 750 columns in an Accelerator Query Table (AQT). Certain functions not supported in V1 of ISAO: mathematical functions such as SIN, COS, TAN, EXP, and CORRELATION; user-defined functions; advanced string functions such as LOCATE, LEFT, OVERLAY, and POSITION; advanced OLAP functions such as RANK, DENSE_RANK, ROW_NUMBER, ROLLUP, and CUBE. Self-joins or cycles in the join graph.
ISAO stores columns in groups, or vertical partitions of the table, called “banks”. The fraction of a row (or tuple) that fits in each bank is called a “tuplet”. The assignment of columns to banks is cell-specific, because the column lengths vary from cell to cell. The assignment uses a bin-packing algorithm that is driven by whether a column fits in a bank, whose width is some fraction of a word, rather than by how the column is used in a workload. Because each bank is a fraction of a word wide, multiple tuplets can be assembled side by side in a register, so that multi-row parallelism is also achieved in SIMD (single instruction, multiple data) fashion. Also, scans need only access the banks that contain columns referenced in a given query, and skip the banks with no referenced columns. This projection is similar to the way pure column stores avoid disk I/Os to great advantage. The savings in ISAO aren’t as dramatic because ISAO performs no disk I/Os anyway (remember, it’s a main-memory database!), but it still saves considerable CPU. All the banks for a set of rows are grouped into a large (1 MB) block called a “cell block”. This organization is known as the PAX organization, as detailed in a paper by Ailamaki et al.
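The bank assignment and the bank-level projection described above can be illustrated with a small sketch. The notes do not say which bin-packing heuristic ISAO uses, so first-fit decreasing is an assumption here, as are the 32-bit bank width, the column names, and the per-column code widths; the helper names `assign_columns_to_banks` and `banks_to_scan` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Bank:
    """One vertical partition of a cell; holds a fixed number of bits per row."""
    width_bits: int
    columns: dict = field(default_factory=dict)  # column name -> code width in bits

    @property
    def used_bits(self):
        return sum(self.columns.values())

    def fits(self, bits):
        return self.used_bits + bits <= self.width_bits


def assign_columns_to_banks(column_bits, bank_width_bits=32):
    """Pack columns into banks with first-fit decreasing (assumed heuristic).

    Placement depends only on whether a column's compressed width fits in a
    bank, not on how often the column is queried.
    """
    banks = []
    # Widest columns first; ties broken by name so the result is deterministic.
    ordered = sorted(column_bits.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, bits in ordered:
        if bits > bank_width_bits:
            raise ValueError(f"column {name!r} ({bits} bits) is wider than a bank")
        for bank in banks:
            if bank.fits(bits):
                bank.columns[name] = bits
                break
        else:
            # No existing bank has room, so open a new one.
            new_bank = Bank(bank_width_bits)
            new_bank.columns[name] = bits
            banks.append(new_bank)
    return banks


def banks_to_scan(banks, referenced_columns):
    """Return indexes of the banks holding at least one referenced column."""
    wanted = set(referenced_columns)
    return [i for i, bank in enumerate(banks) if not wanted.isdisjoint(bank.columns)]


# Hypothetical compressed code widths (in bits) for one cell.
example_column_bits = {
    "customer_id": 20, "price": 17, "order_date": 12,
    "quantity": 7, "region": 5, "status": 3,
}
example_banks = assign_columns_to_banks(example_column_bits)
for index, bank in enumerate(example_banks):
    print(index, sorted(bank.columns), bank.used_bits)
print(banks_to_scan(example_banks, ["region", "price"]))
```

In this example the six columns pack into two 32-bit banks, and a query that references only `region` and `price` touches one of them. Because the code widths differ from cell to cell, the same packing would be recomputed per cell.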